Predicting head motion from prosodic and linguistic features
نویسندگان
چکیده
This paper describes an approach to predict non-verbal cues from speech-related features. Our previous investigations of audiovisual speech showed that there are strong correlations between the two modalities. In this work we developed two models using different kinds of Recurrent Artificial Neural Networks: Elman and NARX, to predict parameters of activity for head motion using linguistic and prosodic inputs, and compared their performance. Prosodic inputs included F0 and intensity, while linguistic parameters included the former plus additional information such as the type of syllables, phrases, and different relations between them. Using speaker specific models for six subjects, performance measures in terms of root mean square error (RMSE) showed that there are significant differences between the models with respect to the input parameters, and that NARX network outperformed the Elman network on the prediction task.
منابع مشابه
A Study of the Relationship between Acoustic Features of “bæle” and the Paralinguistic Information
Language users benefit from special phonetic tools in order to communicate linguistic information as well as different emotional aspects and paralinguistic information through daily conversation. Having functions in conveying semantic information to listeners, prosodic features form the essential part of linguistic behavour, manipulating them potentially can play an important role in transmitt...
متن کاملPredicting Prosodic Phrasing Using Linguistic Features
The prosodic structure of speech is based on complex interaction within and between several different levels of linguistic, and paralinguistic organization, and is expressed in the modulation of F0, intensity, duration, and voice quality, as well as the occurrence of pauses. Even though leading theories of prosody maintain that prosody is shaped through the interaction of grammatical factors fr...
متن کاملAn Information Theoretic Analysis of the Temporal Synchrony Between Head Gestures and Prosodic Patterns in Spontaneous Speech
We analyze the temporal co-ordination between head gestures and prosodic patterns in spontaneous speech in a data-driven manner. For this study, we consider head motion and speech data from 24 subjects while they tell a fixed set of five stories. The head motion, captured using a motion capture system, is converted to Euler angles and translations in X, Y and Z-directions to represent head gest...
متن کاملAutomatic head gesture learning and synthesis from prosodic cues
We present a novel approach to automatically learn and synthesize head gestures using prosodic features extracted from acoustic speech signals. A minimum entropy hidden Markov model is employed to learn the 3-D head-motion of a speaker. The result is a generative model that is compact and highly predictive. The model is further exploited to synchronize the head-motion with a set of continuous p...
متن کاملThe Role of Linguistic and Prosodic Cues on the Prediction of Self-Reported Satisfaction in Contact Centre Phone Calls
Call Centre data is typically collected by organizations and corporations in order to ensure the quality of service, supporting for example mining capabilities for monitoring customer satisfaction. In this work, we analyze the significance of various acoustic features extracted from customer-agents’ spoken interaction in predicting self-reported satisfaction by the customer. We also investigate...
متن کامل